
Conversation

Collaborator

@Allda Allda commented Jan 14, 2026

What does this PR do?

Add init container for workspace restoration

A new init container is added to the workspace deployment in case the user chooses to restore the workspace from a backup.

When the workspace attribute "controller.devfile.io/restore-workspace" is set, the controller adds the new init container instead of cloning data from the git repository.

By default, the path to the restore image is derived automatically from cluster settings. However, the user can override that value using another attribute, "controller.devfile.io/restore-source-image".

The restore container runs a workspace-recovery.sh script that pulls an image using oras and extracts its files into the /projects directory.
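The pull-and-extract flow described above can be sketched roughly as follows (a minimal sketch; the function name, flags, and archive layout are assumptions, not the PR's exact script):

```shell
#!/bin/sh
# Hypothetical sketch of the restore flow: pull the backup artifact with
# oras, then unpack its archives into the projects directory.
restore_from_backup() {
  image="$1"          # OCI reference of the backup artifact (assumed input)
  projects_root="$2"  # target directory, e.g. /projects
  tmp_dir="$(mktemp -d)"
  # oras pull downloads the artifact's files into tmp_dir
  oras pull "$image" --output "$tmp_dir" || return 1
  # extract every pulled tarball into the projects directory
  for archive in "$tmp_dir"/*.tar.gz; do
    [ -e "$archive" ] || continue
    tar -xzf "$archive" -C "$projects_root"
  done
}
```

In the real init container the image reference would come from the cluster-derived default or from the restore-source-image attribute.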

What issues does this PR fix or reference?

#1525

Is it tested? How?

No automated tests are available in the first phase. I will add tests once I get the first approval that the concept is ok.

How to test:

  • Configure a cluster to enable backups
  • Create a new workspace, make changes in any of its files, and save them
  • Stop the workspace: `kubectl patch devworkspace restore-workspace-2 --type=merge -p '{"spec": {"started": false}}'`
  • Wait until it is stopped and the backup has been executed for the workspace (verify the backup image exists in the registry)
  • Delete the workspace from the cluster: `kubectl delete devworkspace restore-workspace-2`
  • Add the controller.devfile.io/restore-workspace attribute to the DevWorkspace resource as shown below
  • Create the workspace
  • Wait until it boots up and verify the changed file is present
kind: DevWorkspace
apiVersion: workspace.devfile.io/v1alpha2
metadata:
  labels:
    controller.devfile.io/creator: ""
  name: restore-workspace-2
spec:
  started: true
  routingClass: 'basic'
  template:
    attributes:
      controller.devfile.io/storage-type: common
      controller.devfile.io/restore-workspace: 'true'
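If the automatically derived backup image path is not suitable, the restore source can be pinned explicitly via the second attribute; for example (the image reference below is a placeholder, not a real backup image):

```yaml
    attributes:
      controller.devfile.io/storage-type: common
      controller.devfile.io/restore-workspace: 'true'
      # placeholder reference; overrides the cluster-derived backup image path
      controller.devfile.io/restore-source-image: 'registry.example.com/backups/restore-workspace-2:latest'
```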

PR Checklist

  • E2E tests pass (when PR is ready, comment /test v8-devworkspace-operator-e2e, v8-che-happy-path to trigger)
    • v8-devworkspace-operator-e2e: DevWorkspace e2e test
    • v8-che-happy-path: Happy path test verifying integration with Che

What's missing:

  • integration with registry authentication ✅
  • integration with built-in OCP registry ✅

@openshift-ci

openshift-ci bot commented Jan 14, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Allda
Once this PR has been reviewed and has the lgtm label, please assign dkwon17 for approval. For more information see the Code Review Process.

Needs approval from an approver in each of these files.

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

) (*corev1.Container, error) {
	workspaceTemplate := &workspace.Spec.Template
	// Check if restore is requested via workspace attribute
	if !workspaceTemplate.Attributes.Exists(constants.WorkspaceRestoreAttribute) {
Collaborator

IMHO I think it would be more readable if this check was done by the caller in devworkspace_controller.go

Collaborator

In the example presented in the Is it tested? How? section of this PR, the example has:

controller.devfile.io/restore-workspace: 'true'

which made me assume that:

controller.devfile.io/restore-workspace: 'false'

would disable the restore functionality for the devworkspace.

Could we also check for "false" string here?

Collaborator Author

I extracted the condition and moved it to the controller itself to make the code path easier to read. I also enhanced the logic to recognize the attribute value. The new test verifies that setting a value to false skips the restore container.
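The value handling the review asked for could look roughly like this (a standalone sketch; the real controller reads devfile attributes via the attributes API rather than a plain map, and the helper name is made up):

```go
package main

import (
	"fmt"
	"strings"
)

// restoreEnabled interprets the "controller.devfile.io/restore-workspace"
// attribute: a missing attribute or any value other than "true" disables
// the restore init container, so 'false' explicitly skips it.
func restoreEnabled(attributes map[string]string) bool {
	val, ok := attributes["controller.devfile.io/restore-workspace"]
	if !ok {
		return false
	}
	return strings.EqualFold(strings.TrimSpace(val), "true")
}

func main() {
	fmt.Println(restoreEnabled(map[string]string{"controller.devfile.io/restore-workspace": "true"}))  // true
	fmt.Println(restoreEnabled(map[string]string{"controller.devfile.io/restore-workspace": "false"})) // false
	fmt.Println(restoreEnabled(nil)) // false
}
```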

Allda added 4 commits January 16, 2026 09:56
A new init container is added to the workspace deployment in case the user
chooses to restore the workspace from a backup.

By setting the workspace attribute "controller.devfile.io/restore-workspace"
the controller sets a new init container instead of cloning data from
the git repository.

By default, an automated path to the restore image is used based on cluster
settings. However, the user can override that value using another
attribute, "controller.devfile.io/restore-source-image".

The restore container runs a workspace-recovery.sh script that pulls an
image using oras and extracts files into the /projects directory.

Signed-off-by: Ales Raszka <araszka@redhat.com>
New tests verify that the workspace is created from a backup. They
check that the deployment is ready and that it contains the new restore init
container with the proper configuration.

There are 2 tests: one focused on a common PVC and another that has
per-workspace storage.

Signed-off-by: Ales Raszka <araszka@redhat.com>
The condition that decides whether a workspace should be restored from backup
was in the restore module itself, which made the code harder to read.
Now the condition is checked in the controller itself, and the restore
container is only added when enabled.

This commit also fixes a few minor issues based on the code review
comments:
- License header
- Attribute validation
- Add a test for disabled workspace recovery
- Typos

Signed-off-by: Ales Raszka <araszka@redhat.com>
A new config is added to control the restore container. Default values
are set for the new init container and can be changed by the user in the
config. The config uses the same logic as the project clone container
config.

Signed-off-by: Ales Raszka <araszka@redhat.com>
BackupCronJob: &controllerv1alpha1.BackupCronJobConfig{
	Enable: ptr.To[bool](true),
	Registry: &controllerv1alpha1.RegistryConfig{
		Path: "localhost:5000",
Collaborator Author

@dkwon17 For the purposes of the integration tests, I need a valid backup container somewhere in a registry (the container could be empty, with a very low size) to be able to start and successfully execute the restore container. What would be the best place to upload it?

To verify that the test works I used my local registry, but that's not an option for running it outside of localhost.


Collaborator Author

@ibuziuk, could you please give me a push permission to the repo so I can push the image there? My Quay account is araszka

@rohanKanojia
Member

@Allda : I'm facing a strange issue while testing this functionality on a CRC cluster. I've tried both the amd64 and arm64 variants but face the same issue. I used samples/plain-workspace.yaml for testing.

Everything goes fine until step 4. But when I create the restore backup manifest, I can see the devworkspace resource is created, yet there is no corresponding pod for it:

oc create -f restore-dw.yaml
devworkspace.workspace.devfile.io/plain-devworkspace created
oc get dw
NAME                 DEVWORKSPACE ID             PHASE     INFO
plain-devworkspace   workspace612b8ddca9ff45d5   Running   Workspace is running
oc get pods
No resources found in rokumar-dev namespace.

I had just modified the name from the restore manifest you shared:

kind: DevWorkspace
apiVersion: workspace.devfile.io/v1alpha2
metadata:
  labels:
    controller.devfile.io/creator: ""
  name: plain-devworkspace
spec:
  started: true
  routingClass: 'basic'
  template:
    attributes:
      controller.devfile.io/storage-type: common
      controller.devfile.io/restore-workspace: 'true'

Could you please check if I'm missing something?

Collaborator Author

Allda commented Jan 21, 2026

> @Allda : I'm facing a strange issue while testing this functionality on CRC cluster [...] Could you please check if I'm missing something?

I am not sure why it doesn't work on your system. I tried the workspace you mentioned, and the backup and other pods were created successfully.

~ k get deployments -n araszka
NAME                        READY   UP-TO-DATE   AVAILABLE   AGE
workspacefba99cfc93514828   1/1     1            1           40s

~ k get pods -n araszka                                                               
NAME                                         READY   STATUS             RESTARTS      AGE
fuse-builder-pod                             0/1     ImagePullBackOff   2 (20d ago)   63d
workspacefba99cfc93514828-84b865df77-t5gwz   1/1     Running            0             51s

Are there any logs you can share? Or can you check whether the workspace has any pods at the very start?

@rohanKanojia
Member

@Allda : Okay, I'll try it again and report back

Signed-off-by: Ales Raszka <araszka@redhat.com>
@Allda Allda force-pushed the 23570-restore-workspace branch from 368a98f to 9db9b7c Compare January 21, 2026 12:47
@dkwon17
Collaborator

dkwon17 commented Jan 21, 2026

I was able to successfully restore a real Dev Spaces project:

[screenshot]

mkdir /tmp/extracted-backup
tar -xzvf /tmp/devworkspace-backup.tar.gz -C /tmp/extracted-backup

cp -r /tmp/extracted-backup/* "$PROJECTS_ROOT"
Collaborator

Maybe we should restore only if $PROJECTS_ROOT is empty so that restoration happens only once?

Because, if not, we can run into this case:

1. Workspace starts, restores from image
2. User makes changes to project
3. User stops workspace
4. User starts workspace
5. Restore init container restores $PROJECTS_ROOT
6. Changes from 2. are gone since files were overwritten

Collaborator Author

Sure, I can change that. Or is there any other mechanism or indicator that the workspace was stopped and resumed? If so, I could skip adding the restore container.

Collaborator Author

One clarifying question... How is the scenario you described handled for the clone container? Does it clone the repo again or does it skip the cloning?

Collaborator

is there any other mechanism or indicator that the workspace was stopped and resumed?

Unfortunately we don't have a robust way to detect that today

Currently for the project clone container, the cloning is skipped when the project has already been cloned and is available under /projects:

// CheckProjectState Checks that a project's configuration is reflected in an on-disk git repository.
// - Returns needClone == true if the project has not yet been cloned
// - Returns needRemotes == true if the remotes configured in the project are not available in the on-disk repo
//
// Remotes in provided project are checked against what is configured in the git repo, but only in one direction.
// The git repo can have additional remotes -- they will be ignored here. If both the project and git repo have remote
// A configured, but the corresponding remote URL is different, needRemotes will be true.
func CheckProjectState(project *dw.Project) (needClone, needRemotes bool, err error) {
	if project.Attributes.Exists(ProjectSubDir) {
		return checkSubPathProjectState(project)
	}
	repo, err := OpenRepo(path.Join(ProjectsRoot, projectslib.GetClonePath(project)))

Collaborator Author

Sure, I added logic to skip the restore function in case the project dir is not empty.
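The guard can be as small as an emptiness check before any pull happens (a sketch; the directory argument mirrors the $PROJECTS_ROOT used by the recovery script, and the function name is an assumption):

```shell
#!/bin/sh
# Skip restoration when the projects directory already has content,
# so a resumed workspace never has its files overwritten.
restore_needed() {
  dir="$1"
  if [ -d "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]; then
    echo "Skipping restore: $dir is not empty" >&2
    return 1
  fi
  return 0
}
```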

The function handleRegistryAuthSecret is also needed for the restore action.
The backup controller is not the right place for it, as it can't be reused
from there.

Moving it to a separate module makes it reusable.

Signed-off-by: Ales Raszka <araszka@redhat.com>

if !hasContainerComponents(workspaceTemplate) {
	// Avoid adding restore init container when DevWorkspace does not define any containers
	return nil, nil
Collaborator

@rohanKanojia for your comment: #1572 (comment)

I believe you're running into this case?

Member

Yeah, I'm trying to create literally the same DevWorkspace manifest as provided in the description.

It started working when I provided the full DevWorkspace manifest with the restore attribute.

Allda added 3 commits January 23, 2026 13:50
The workspace restore process now supports access to registries that
require authentication. The default operator config is used to detect the
right secret name, and the secret is injected into the container as a
volume.

The second part of the commit adds support for the OCP built-in registry.
With this registry a user doesn't have to provide any secrets, because the
workspace service account has the image puller role associated via role
bindings.

Signed-off-by: Ales Raszka <araszka@redhat.com>
The image stream created by the backup mechanism can't be owned by the
workspace, otherwise deletion of the workspace also deletes the image
stream. Removing the reference keeps these two objects unrelated.

Signed-off-by: Ales Raszka <araszka@redhat.com>
In case the workspace directory is not empty, the restore process should
just log the event and skip the restoration.

This logic is added to avoid overriding data in case a workspace is resumed.

Usual flow:
- create a workspace and enable restoration
- the restore container restores data from the backup
- the user updates files in the workspace
- the user stops the workspace
- the user resumes the workspace
- the restore container skips the restore process because the directory is not empty

Signed-off-by: Ales Raszka <araszka@redhat.com>
Collaborator Author

Allda commented Jan 23, 2026

@dkwon17 I extended the controller to support an authenticated registry and also natively support the default built-in OCP registry. I tested it with multiple scenarios:

  • public registry
  • a private registry (quay.io) with auth
  • a default OCP registry

@openshift-ci

openshift-ci bot commented Jan 23, 2026

@Allda: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name: ci/prow/v14-devworkspace-operator-e2e
Commit: 8fcc091
Required: true
Rerun command: /test v14-devworkspace-operator-e2e


defaultRoleName := common.WorkspaceRoleName()
defaultRolebindingName := common.WorkspaceRolebindingName()
-if err := addServiceAccountToRolebinding(saName, workspace.Namespace, defaultRoleName, defaultRolebindingName, api); err != nil {
+if err := addServiceAccountToRolebinding(saName, workspace.Namespace, defaultRoleName, defaultRolebindingName, "Role", api); err != nil {
if registryAuthSecret != nil {
	// Add the registry auth secret volume
	devfilePodAdditions.Volumes = append(devfilePodAdditions.Volumes, corev1.Volume{
		Name: "registry-auth-secret",
Collaborator

dkwon17 commented Jan 23, 2026

@Allda thank you for the updates,

a private registry (quay.io) with auth

I ran into some issues with this in my lab cluster, where the auth secret isn't mounting properly onto the restore init container. I will investigate more on Monday and provide more feedback.
